A High-Level Design for Pan

Robert A. Ballance
June, 1985

1.  Overview 

Pan is a multilingual language-based editor for manipulating tree-structured documents. The
editor supports both tree- and text-oriented operations. The expected use of this system is as the
front-end for a development environment in which experienced developers use several languages
while creating a complex program or other document. One task of the front-end is to gather and
make available information about the document for use by the developers and by other tools.

Multiple languages are handled by separating the language-specific information from the
generic utilities supplied by the editor. Language-specific information, in the form of a language
description, is preprocessed into tables for use by the editor. The editing component itself is
table-driven. New languages can be added to the system by creating and loading a new set of tables.

Pan is designed to handle different languages in different editing workspaces; switching workspaces
within an editing session allows the user to edit different languages.

There are two major components to the Pan system: the editor and the table generator. The
editor supplies editing operations while checking that the document meets the requirements of the
language in which it is written. These requirements fall into three categories: lexical, syntactic,
and contextual. (Contextual requirements are often called the "static semantics" of a language.)
Information concerning errors or inconsistencies in the document is communicated to the user
during the course of editing.

The editor uses both the concrete representation of a document (the representation as seen by
a user of the system) and the abstract syntax of the document to implement its editing operations.
The correspondence between the two representations is maintained by an incremental scanning
and parsing system. The abstract syntax is in the form of an operator/phylum tree[4]. Contextual
constraints are enforced using only the abstract syntax. Other tools in the environment may add
information to the internal tree representation; it is the structure of the tree which is of primary
interest to the editor.

The table generator takes a language description, checks it for consistency and for the
properties required by the algorithms used in Pan, and then generates the tables used by the editor.
In fact, the table generator is a collection of tools, many of which already exist in the UNIX
programming environment.

2. The Editor

The editor includes the basic utilities, a general-purpose text editor, a display manager, and
components for incremental lexical analysis, incremental parsing, and incremental contextual
constraint checking. The user interface of the editor makes use of the workstation environment by
supporting mouse/menu interaction along with keyboard interaction. Multiple windows into
different edit workspaces are anticipated.

The basic utilities supply editing workspaces (buffers), interactions with long-term storage,
undo processing, a help system, and command definition facilities. The editor is extensible and
customisable while integrating mouse and menu forms of interaction. The design allows the user
to override most system configurations with workspace-local customisations affecting the contents
of menus, display options, and bindings of commands to keystroke sequences. Initially, documents
are stored as specially formatted files in the UNIX file system.

The ability to undo commands is provided by a set of conventions for communicating
information to an undo processor. The undo processor is invoked by the "Undo" command. This
design allows different undo strategies to be investigated. The initial strategy to be implemented
simply restores the workspace to the state prior to the command; only the most recent command
is undoable.
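The initial undo strategy can be pictured with a small sketch. It is written in Python purely
for illustration (Pan itself is planned in LISP and C), and the names Workspace, run, and undo
are hypothetical, not part of the design. The sketch assumes a workspace whose whole state is a
single string.

```python
class Workspace:
    """Hypothetical workspace holding its text as one string."""

    def __init__(self, text=""):
        self.text = text
        self._saved = None  # state before the most recent command, if any

    def run(self, command, *args):
        # Record the prior state, then apply the command.
        # Only one saved state is kept, so only the last command is undoable.
        self._saved = self.text
        self.text = command(self.text, *args)

    def undo(self):
        # Restore the state prior to the most recent command.
        if self._saved is None:
            return False
        self.text, self._saved = self._saved, None
        return True


ws = Workspace("abc")
ws.run(lambda text, suffix: text + suffix, "def")
ws.undo()   # the workspace text is "abc" again
ws.undo()   # returns False: nothing further to undo
```

A different strategy (for example, a history of saved states) would change only the body of
run and undo, which is the kind of experiment the conventions are meant to permit.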

Command definition facilities allow new commands to be added to the editor. Users are free
to develop their own libraries of commands. The definition facilities include procedure definition,
accumulation of help and undo information, and specification of initial bindings. A general help
facility is provided as a part of the editor.
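As a rough illustration of what such a definition facility gathers, the following Python sketch
registers a procedure together with its help text and an initial key binding. The names
define_command, COMMANDS, KEYMAP, and the "Upcase-Region" command are all invented for this
example; undo information is left out for brevity.

```python
COMMANDS = {}  # command name -> {"procedure": ..., "help": ...}
KEYMAP = {}    # keystroke sequence -> command name


def define_command(name, help_text, keys=None):
    """Return a decorator that registers a procedure as an editor command."""
    def register(procedure):
        COMMANDS[name] = {"procedure": procedure, "help": help_text}
        if keys is not None:
            KEYMAP[keys] = name  # initial binding; a workspace could override it
        return procedure
    return register


@define_command("Upcase-Region", "Convert the region to upper case.", keys="C-x C-u")
def upcase_region(text):
    return text.upper()


def help_for(name):
    return COMMANDS[name]["help"]


def invoke_key(keys, text):
    # Look up the command bound to the keystroke sequence and run it.
    return COMMANDS[KEYMAP[keys]]["procedure"](text)
```

Here invoke_key("C-x C-u", "pan") looks up the binding and applies the registered procedure.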

A  general-purpose  text  editor  is  included  in  the  design.  The  text  editor  operates  on  an  active 
region  of  text  which  could  be  the  entire  document.  In  this  case,  the  entire  system  functions  as  a 
display-oriented  text  editor. 

Operations upon text are themselves implemented as operations on "text-regions"; the
elementary editing operations are "Insert-Region" and "Delete-Region". Character-oriented operations
are modeled as operations on single-character regions. Internally, text-regions are implemented
using linked lists of contiguous arrays of characters. This representation permits one to designate a
particular character in the region without updating the designation until it is used. For this reason,
the designation is called a sticky-pointer[2]; the pointer sticks to its designation. Sticky pointers
are used to implement text-regions, to implement undo operations, and to maintain the mapping
between tokens and the textual representation when necessary.
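One way to read the sticky-pointer idea is sketched below in Python. This is an assumption about
how lazy designations might work, not the scheme of [2]: an insertion splits a chunk and leaves a
forwarding link, and a pointer follows those links only when it is next used. The classes Chunk,
StickyPointer, and TextRegion are hypothetical, and deletion is omitted.

```python
class Chunk:
    """A contiguous array of characters in a linked list."""

    def __init__(self, chars):
        self.chars = list(chars)
        self.next = None     # next chunk in document order
        self.forward = None  # chunk that received our tail when we were split


class StickyPointer:
    """Designates one character; updated lazily, only when used."""

    def __init__(self, chunk, offset):
        self.chunk, self.offset = chunk, offset

    def resolve(self):
        # Follow forwarding links left behind by earlier splits.
        while self.offset >= len(self.chunk.chars):
            if self.chunk.forward is None:
                raise IndexError("pointer does not designate a character")
            self.offset -= len(self.chunk.chars)
            self.chunk = self.chunk.forward
        return self.chunk, self.offset

    def char(self):
        chunk, offset = self.resolve()
        return chunk.chars[offset]


class TextRegion:
    def __init__(self, text):
        self.head = Chunk(text)

    def insert_region(self, pointer, text):
        """Insert text just before the designated character."""
        chunk, offset = pointer.resolve()
        tail = Chunk(chunk.chars[offset:])
        tail.next, tail.forward = chunk.next, chunk.forward
        new = Chunk(text)
        new.next = tail
        chunk.chars = chunk.chars[:offset]
        chunk.next, chunk.forward = new, tail  # other pointers are not touched

    def text(self):
        out, node = [], self.head
        while node is not None:
            out.extend(node.chars)
            node = node.next
        return "".join(out)


region = TextRegion("hello world")
w = StickyPointer(region.head, 6)                       # designates "w"
region.insert_region(StickyPointer(region.head, 5), ", big")
```

After the insertion the region reads "hello, big world", and w.char() still yields "w" even
though w was not visited during the edit.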

The display manager and the user interface take advantage of bit-mapped graphics and a
mouse. Initially, the display manager will be text-based, with the intention of substituting an
object-oriented display manager at a later time. The user interface is itself defined using the
extension facilities of the system, allowing interface designers to experiment with alternative
dialogues between system and user.

Incremental lexical analysis maps a textual representation to the basic lexical units of the
language under consideration. Pan provides a generic interface for communicating with lexical
analysers. The actual code for detecting lexical units can either be supplied as a hand-written
analyser or as a specification for the lex[S] lexical analyser generator. The result of incremental
lexical analysis is a list of tokens together with information as to how the token sequence has
changed. This information is used by the incremental parsing algorithm to reparse the affected
area. Tokens not relevant to the parser are screened out by the incremental lexical analyser. These
are placed on a separate data structure where they can be attached to nodes in the abstract
syntax tree.
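The shape of the analyser's result can be shown with a simplified Python sketch. It rescans the
whole text rather than working incrementally, so it illustrates only the output (parser tokens, a
separate list of screened tokens, and a description of the change), not the incremental algorithm.
The token classes and the function names scan and token_change are assumptions made for the
example.

```python
import re

# Hypothetical token classes; comments and white space are screened from the parser.
TOKEN_RE = re.compile(
    r"(?P<comment>#[^\n]*)|(?P<ws>\s+)|(?P<word>\w+)|(?P<punct>[^\w\s])"
)
SCREENED = {"comment", "ws"}


def scan(text):
    """Return (tokens for the parser, screened tokens kept separately)."""
    parser_tokens, screened = [], []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        target = screened if kind in SCREENED else parser_tokens
        target.append((kind, match.group()))
    return parser_tokens, screened


def token_change(old, new):
    """Describe the change as (start, old_end, new_end) token indices."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return prefix, len(old) - suffix, len(new) - suffix


old_tokens, old_screened = scan("x = a + b  # sum")
new_tokens, new_screened = scan("x = a * b  # sum")
change = token_change(old_tokens, new_tokens)
```

For this edit the change covers only the operator token, so a parser would need to reconsider
just that part of the sequence.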

The incremental parsing algorithm allows changes in the external representation (the text)
to be reflected in the internal representation (the abstract syntax tree). The algorithm uses a
bottom-up (LALR) parser to perform the actual parsing.

In order to parse a change incrementally, the state of the parser (at any point) must be
recoverable. When certain relationships hold between the abstract syntax, the external representation
of the abstract syntax, and the grammar used to parse the external representation, the state of the
parser can be recovered from the abstract syntax tree, while the abstract syntax tree can be derived
directly from the parse tree. These relationships can be checked at table generation time, so that
the actual conversions amount to simple table lookups. The actual parse tree is never explicitly
represented. This is important because the tree that is generated directly from a bottom-up parser
is generally much larger and more complicated than an abstract syntax tree. The incremental
parsing algorithm of Jalili and Gallier[S] has been chosen for Pan. It can be extended easily to
perform the necessary transformations.

Error recovery during parsing will use the panic mode method proposed by Robert Corbett
in [1]. When an error is detected, the recovery algorithm isolates the affected area and continues.
The tokens isolated during recovery, together with a message describing the error, will be attached
to an "error node" in the abstract syntax tree. The user will then be able to select that node in
order to see the message.
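The effect of isolating an error can be sketched with a toy parser. The Python below is generic
textbook panic-mode recovery, not the specific method of [1], and it uses a hand-written
recogniser for an invented statement form ("name = number ;") instead of an LALR parser. The
classes Assign and ErrorNode are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Assign:
    name: str
    value: str


@dataclass
class ErrorNode:
    tokens: list   # the tokens isolated during recovery
    message: str   # shown when the user selects the node


def parse_statements(tokens):
    """Parse a sequence of 'name = number ;' statements."""
    nodes, i = [], 0
    while i < len(tokens):
        stmt = tokens[i:i + 4]
        if (len(stmt) == 4 and stmt[0].isidentifier() and stmt[1] == "="
                and stmt[2].isdigit() and stmt[3] == ";"):
            nodes.append(Assign(stmt[0], stmt[2]))
            i += 4
            continue
        # Panic mode: skip ahead to the next ';' and isolate what was skipped.
        j = i
        while j < len(tokens) and tokens[j] != ";":
            j += 1
        nodes.append(ErrorNode(tokens[i:j + 1], "expected: name = number ;"))
        i = j + 1
    return nodes


tree = parse_statements(["x", "=", "1", ";", "y", "=", "=", ";", "z", "=", "3", ";"])
```

The malformed middle statement becomes a single error node carrying its tokens and a message,
while the statements on either side are parsed normally.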

Pan provides a general notion of "attachments" to nodes in the abstract syntax. Some tokens,
such as comments, are in fact attached to nodes as system-known properties. Operations to add or
delete attachments are included in the repertoire of editing operations. Other tools can attach other
information, provided that they ignore any attachments that they don't recognise. Attachments
themselves may be either "prefix" or "postfix". To attach lexical items such as comments, the
lexical item is declared as an attachment. When encountered in the token stream, the token is
either attached (postfix) to the top node in the tree-building stack, or is placed on a separate
stack. As nodes in the abstract syntax tree are created, the attachment stack is consulted, and
if the created node accepts the kind of attachment on the stack, the connection is made and the
attachment stack popped.
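The attachment stack described above might behave as in the following Python sketch. The class
and method names are invented, and one detail is an assumption of the sketch: when several
acceptable attachments are pending, the new node takes all of them, not just the top one.

```python
class Node:
    def __init__(self, operator, accepts=()):
        self.operator = operator
        self.accepts = set(accepts)  # kinds of attachment this node accepts
        self.prefix, self.postfix = [], []


class TreeBuilder:
    def __init__(self):
        self.nodes = []    # tree-building stack
        self.pending = []  # separate attachment stack

    def attachment(self, kind, text, postfix=False):
        if postfix and self.nodes:
            # Attach to the node currently on top of the tree-building stack.
            self.nodes[-1].postfix.append((kind, text))
        else:
            self.pending.append((kind, text))

    def create(self, operator, accepts=()):
        node = Node(operator, accepts)
        # Consult the attachment stack; pop while the node accepts the top item.
        while self.pending and self.pending[-1][0] in node.accepts:
            node.prefix.insert(0, self.pending.pop())
        self.nodes.append(node)
        return node


builder = TreeBuilder()
builder.attachment("comment", "# header")
function = builder.create("function", accepts=("comment",))
builder.attachment("comment", "# trailing", postfix=True)
```

Here the header comment becomes a prefix attachment of the function node, and the trailing
comment is attached postfix to the same node.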

Using an operator/phylum tree allows the actual structure of the internal tree to be hidden
from the user. Each node in the tree represents an operator in the abstract syntax; groups of
operators are called phyla. Commands which traverse trees can be defined in terms of either the
operators or the phyla, such as "Next-Function", instead of being defined solely in terms of the
tree structure ("Next-Sibling").
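A traversal command defined over operators or phyla could look like the Python sketch below.
The operator names, the PHYLA table, and the function next_in are all assumptions made for the
example; in Pan such tables would come from the language description.

```python
# Hypothetical phyla: each phylum is a named group of operators.
PHYLA = {
    "declaration": {"function", "variable"},
    "statement": {"assign", "call"},
}


class Node:
    def __init__(self, operator, children=()):
        self.operator = operator
        self.children = list(children)


def preorder(node):
    yield node
    for child in node.children:
        yield from preorder(child)


def next_in(root, current, operators):
    """Return the first node after `current` whose operator is in `operators`."""
    seen = False
    for node in preorder(root):
        if seen and node.operator in operators:
            return node
        if node is current:
            seen = True
    return None


f = Node("function", [Node("assign")])
v = Node("variable")
g = Node("function", [Node("call")])
program = Node("program", [f, v, g])

next_function = next_in(program, f, {"function"})            # by operator
next_declaration = next_in(program, f, PHYLA["declaration"])  # by phylum
```

The user asks for the next function or the next declaration without needing to know how many
siblings or nested nodes lie in between.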

Contextual constraints are specified and checked using a new method modeled on logic
programming. In this model, a database of information is built up during editing; the information in
the database can then be consulted by the contextual constraints during constraint enforcement.
A description of contextual constraints includes the global (context independent) axioms of a
language, the definitions of facts to be entered in the database, and the actual constraints. Both fact
definitions and constraints are attached to the rules in the abstract syntax.

The database itself is logically structured to reflect the naming rules of a language; in a
programming language, these are the scope rules. The constraint checker ensures that the database
structure is up to date before evaluating other constraints. As facts are added and deleted from
the database, a dependency tracking mechanism will ensure consistency. This model relieves the
author of a language description from explicitly defining dependencies as is the case with attribute
grammars. In addition, a user of the system (or other tools, including the editor itself) can access
the database to get information about the document.
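A very small model of a scope-structured fact database with dependency tracking is sketched
below in Python. It is only one guess at how such a mechanism could work: the class
FactDatabase and its methods are hypothetical, facts are reduced to name/value pairs, and
dependencies are tracked per name, not per fact.

```python
class FactDatabase:
    """Hypothetical database of facts, structured by nested scopes."""

    def __init__(self):
        self.facts = {}    # (scope, name) -> value
        self.parent = {}   # scope -> enclosing scope, or None
        self.readers = {}  # name -> constraints that consulted it

    def add_scope(self, scope, parent=None):
        self.parent[scope] = parent

    def lookup(self, scope, name, reader):
        # Record the dependency automatically, then search outward by scope.
        self.readers.setdefault(name, set()).add(reader)
        while scope is not None:
            if (scope, name) in self.facts:
                return self.facts[(scope, name)]
            scope = self.parent.get(scope)
        return None

    def assert_fact(self, scope, name, value):
        self.facts[(scope, name)] = value
        return set(self.readers.get(name, ()))  # constraints to re-evaluate

    def retract(self, scope, name):
        self.facts.pop((scope, name), None)
        return set(self.readers.get(name, ()))


db = FactDatabase()
db.add_scope("global")
db.add_scope("f", parent="global")
db.assert_fact("global", "x", "int")
first = db.lookup("f", "x", reader="use-1")   # finds the global declaration
stale = db.assert_fact("f", "x", "string")    # a local declaration now shadows it
```

Because the lookup itself records which constraint consulted the name, the author of the
constraint never states the dependency; adding the local declaration reports "use-1" as
needing re-evaluation.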

3. Language Descriptions

To add a new language to the repertoire of languages known by Pan, one must provide
information about the lexical, syntactic, and contextual structures of the language. This information
is gathered together as a language description. The description has separate parts for each of the
above aspects, plus parts for information about external representations and pretty printing.

The lexical definition of a language can be either a lex-like specification (which will be
processed by lex) or the designation of a procedure. In the latter case, a hand-coded lexical analyser
is being supplied. Associated with the token definitions is such information as whether the token
is to be screened from the parser. Also provided are standard routines for detecting lexical items
not easily specifiable by regular expression, such as nested sequences of brackets.
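Nested brackets are the classic case that regular expressions cannot describe, because matching
them requires counting. A standard routine of the kind mentioned might resemble this Python
sketch; the name scan_nested and its signature are assumptions.

```python
def scan_nested(text, start, open_ch="(", close_ch=")"):
    """Return the index just past the bracket group opening at `start`.

    Returns None if `start` is not an opening bracket or the group is unclosed.
    """
    if start >= len(text) or text[start] != open_ch:
        return None
    depth = 0
    for i in range(start, len(text)):
        if text[i] == open_ch:
            depth += 1
        elif text[i] == close_ch:
            depth -= 1
            if depth == 0:
                return i + 1
    return None


source = "f(a(b)c)d"
end = scan_nested(source, 1)
lexeme = source[1:end] if end is not None else None
```

A hand-written analyser could call such a routine when it sees an opening bracket and treat the
whole balanced group as one lexical item.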

The  syntactic  description  has  three  subparts:  the  abstract  syntax,  the  external  representation, 
and  the  grammar  to  use  for  generating  a  parser.  This  latter  grammar  will  be  passed  to  a  parser 
generator  to  create  the  actual  parse  tables.  The  relationships  among  those  three  descriptions 
required  for  incremental  parsing  will  be  enforced  prior  to  table  generation. 

The contextual constraint definition consists of clauses attached to rules in the abstract syntax,
of axioms independent of the syntax, and of other information required by the evaluator.

4. Implementation

Pan will be implemented on a SUN workstation. The primary implementation language will
be LISP, with recourse to C for low-level routines and access to the screen.